84 research outputs found

SQL Query Completion for Data Exploration

Author: Guilly Marie Le
Petit Jean-Marc
Scuturici Vasile-Marian
Publication date: 07/02/2018

Amid the big data tsunami, relational databases and SQL are still here and remain mandatory in most cases for accessing data. On the one hand, SQL is easy for non-specialists to use and makes it possible to identify pertinent initial data at the very beginning of the data exploration process. On the other hand, formulating SQL queries is not always easy: it is increasingly common to have several databases available for one application domain, some of them with hundreds of tables and/or attributes. Identifying the pertinent conditions to select the desired data, or even identifying relevant attributes, is far from trivial. To make SQL queries easier to write, we propose the notion of SQL query completion: given a query, it suggests additional conditions to add to its WHERE clause. This completion is semantic, as it relies on the data in the database, unlike current completion tools, which are mostly syntactic. Since the process can be repeated until the data analyst reaches her data of interest, SQL query completion facilitates the exploration of databases. SQL query completion has been implemented in a SQL editor on top of a database management system. The evaluation studies two questions: first, does the completion speed up the writing of SQL queries? Second, is the completion easily adopted by users? To answer them, a thorough experiment was conducted with 70 computer science students divided into two groups (one with the completion and one without). The results are positive and very promising.
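
The abstract does not describe how the suggested conditions are computed, so the following is only a minimal sketch of the general idea of data-driven (semantic) completion, not the authors' method. It assumes a hypothetical heuristic: among the rows the current query already selects, propose `column = value` conditions ranked by frequency, skipping values shared by every selected row. The function name `suggest_conditions`, the `stars` table, and its contents are illustrative assumptions.

```python
import sqlite3


def suggest_conditions(conn, table, where="1=1", limit=3):
    """Suggest extra `column = value` conditions for a query's WHERE clause.

    Hypothetical heuristic (not taken from the paper): rank values by how
    often they occur among the rows the current query already selects, and
    skip values shared by every selected row, since adding them changes nothing.
    """
    cur = conn.cursor()
    total = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {where}"
    ).fetchone()[0]
    # Column names come from the database itself, not from the query text.
    columns = [row[1] for row in cur.execute(f"PRAGMA table_info({table})")]
    candidates = []
    for col in columns:
        rows = cur.execute(
            f"SELECT {col}, COUNT(*) FROM {table} WHERE {where} GROUP BY {col}"
        ).fetchall()
        for value, count in rows:
            if value is None or count == total:
                continue
            candidates.append((count, col, value))
    # Most frequent first; ties broken by column name, then by value.
    candidates.sort(key=lambda c: (-c[0], c[1], str(c[2])))
    return [f"{col} = {value!r}" for _, col, value in candidates[:limit]]


def demo():
    """Build a toy table (illustrative data) and complete one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stars (name TEXT, kind TEXT, visible INTEGER)")
    conn.executemany(
        "INSERT INTO stars VALUES (?, ?, ?)",
        [
            ("Sirius", "dwarf", 1),
            ("Vega", "dwarf", 1),
            ("Rigel", "giant", 1),
            ("Proxima", "dwarf", 0),
        ],
    )
    return suggest_conditions(conn, "stars", "kind = 'dwarf'")
```

Calling `demo()` completes the query `SELECT * FROM stars WHERE kind = 'dwarf'`. Because the suggestions come from the stored rows rather than from SQL grammar, repeating the call with an accepted suggestion appended to `where` narrows the result step by step, which mirrors the iterative exploration the abstract describes.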

arXiv.org e-Print Archive

HAL

Hal-Diderot

Semantic modeling of life cycle inventory databases (Modélisation sémantique des bases de données d'inventaires en cycle de vie)

Author: BERTIN Benjamin
PINON Jean-Marie
SCUTURICI Vasile-Marian
Publication date: 01/01/2013

Assessing the environmental impact of goods and services is now a major challenge, for both economic and ethical reasons. Life Cycle Assessment (LCA) is the accepted methodology for modeling the environmental impacts of human activities. One of its steps, the Life Cycle Inventory, decomposes the studied system into interdependent processes. Every process has several environmental impacts, and composing the processes gives the cumulative environmental impact of the studied activities. Several companies and government agencies provide life cycle inventory databases so that LCA practitioners can reuse previously studied processes when analyzing a new system. Auditing and understanding these databases requires analyzing a very large number of processes and their dependency relations: a database can contain several thousand processes and tens of thousands of dependency relations. We identified two problems that experts face when using these databases:
- organizing the processes and their dependency relations to make the model easier to understand;
- calculating the impacts of a model (a composition of processes) and, when the calculation does not converge, finding out why.
In this thesis, we:
- show that there are semantic similarities between the processes and their dependency relations, and propose a new way to model the dependency relations in an inventory database. Our approach semantically indexes the processes using an ontology and uses a multi-level model of the dependency relations. We also study two declarative approaches for interacting with this multi-level model;
- study methods for calculating the environmental impacts of processes based on classical notions of linear algebra and graph theory, together with the conditions under which these methods fail to converge when the dependency model contains cycles.
A prototype implementing this approach gave convincing results on the cases studied. We ran a case study with it on electricity production processes in the United States, using a data set extracted from the life cycle inventory database of the American environmental agency (National Renewable Energy). This prototype is the basis of an operational application used by the company.
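
The abstract names linear algebra and graph theory but gives no formulas, so the following is a hedged sketch of one standard formulation, not necessarily the one used in the thesis. It assumes each process i has a direct impact d_i and consumes a quantity a_ij of process j per unit of output, so that cumulative impacts satisfy x = d + A x. The sketch iterates that equation and reports failure when a cycle makes the values grow without bound. The function name, process names, and numbers are illustrative assumptions.

```python
def cumulative_impacts(direct, deps, tol=1e-9, max_iter=10_000):
    """Iterate x <- direct + A x until the values settle.

    direct: {process: direct impact per unit of output}
    deps:   {process: {upstream process: quantity consumed per unit}}
    Returns {process: cumulative impact}, or None when the iteration does
    not converge (for example a cycle that amplifies itself).
    """
    names = list(direct)
    x = dict(direct)
    for _ in range(max_iter):
        new = {
            i: direct[i] + sum(q * x[j] for j, q in deps.get(i, {}).items())
            for i in names
        }
        if any(abs(v) > 1e12 for v in new.values()):
            return None  # values are blowing up: treat as non-convergent
        if max(abs(new[i] - x[i]) for i in names) < tol:
            return new
        x = new
    return None  # did not settle within max_iter iterations


# Illustrative cyclic model: electricity needs coal, coal mining needs electricity.
DIRECT = {"electricity": 1.0, "coal": 2.0}
DAMPED_CYCLE = {"electricity": {"coal": 0.5}, "coal": {"electricity": 0.2}}
# Same cycle with quantities large enough that each loop amplifies the impact.
AMPLIFYING_CYCLE = {"electricity": {"coal": 2.0}, "coal": {"electricity": 1.0}}
```

With `DAMPED_CYCLE` the loop multiplies impacts by 0.5 × 0.2 = 0.1 per round trip, so the iteration settles (electricity tends to 2/0.9, about 2.22). With `AMPLIFYING_CYCLE` the round-trip factor is 2, the values grow without bound, and the function returns `None`. In matrix terms, the iteration converges when the spectral radius of A is below 1.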

OpenGrey Repository

Continuous data management in pervasive applications (Gestion continue des données dans les applications pervasives)

Author: Scuturici Vasile-Marian
Publication venue: HAL CCSD
Publication date: 13/12/2013

This manuscript presents my contributions to continuous data management in pervasive applications. In the introduction, I trace the origins of continuous data management in a pervasive environment. I present my contributions on this topic, with applications to concrete examples. A substantial part of the document is devoted to the project I wish to carry out in the coming years: integrating data mining primitives into the optimization of continuous database queries, with robotics as the main target application. To make the case that this proposal is viable, I introduce, in a very schematic way, the notion of an autonomous continuous query. The major difficulty lies in optimizing these queries. I see this optimization as closely tied to the integration of data mining techniques into the query engine. The document presents several applications to current problems, using real data to test and validate each theoretical proposal. These data come from collaborations with companies in the region, in various domains: video surveillance, sensor networks, home automation, data streams, and databases.

Hal-Diderot

Managing Distributed Service Environments: A Data-oriented Approach

Author: Gripay Yann
Scuturici Vasile-Marian
Publication venue: HAL CCSD
Publication date: 07/06/2010

Managing dynamic and distributed computing environments, also known as pervasive or ubiquitous environments, is currently a major issue in many application domains. Abstracting functionality as services, which represent sensors, actuators, and all other devices ranging from small mobile handsets to powerful servers, is a common way to address this issue. However, managing a large number of dynamic distributed services remains difficult. In this paper, we present a data-oriented approach for distributed service environments. It relies on a model that homogeneously represents services producing, storing and/or consuming data and data streams, and providing computation and actuator functionality. This model enables the extension of one-shot and continuous query processing techniques to manage distributed service environments. We also describe a protocol of RESTful web services implementing this model, and service-oriented continuous query processing techniques built on top of it.

Hal-Diderot

PaTHOS: Part-based Tree Hierarchy for Object Segmentation

Author: Miguet Serge
Scuturici Mihaela
Scuturici Vasile-Marian
Suta Loreta
Publication venue: HAL CCSD
Publication date: 27/08/2013

The problem we address in this paper is segmentation and hierarchical grouping in digital images. No constraints are imposed on the user regarding the image acquisition protocol. First, histogram thresholding produces numerous segments that each satisfy a homogeneity criterion. Segments are then merged using similarity properties and aggregated into a hierarchy based on spatial inclusion. Shape and color features are extracted from the resulting segments. Tests performed on Oxford Flower 17 show that our method outperforms a similar one and allows the relevant object to be selected from the hierarchy. In our case, this approach is the first stage towards flower variety identification.

HAL

Addressing resource usage in stream processing systems: sizing window effect

Author: Scuturici Vasile-Marian
Surdu Sabina
Publication venue: Association for Computing Machinery (ACM)
Publication date: 21/09/2011

Stream processing systems compute continuous queries over increasingly large volumes of data, as monitoring applications emerge in a broad array of fields. These systems need to satisfy application-dependent constraints, among the most important of which are accuracy demands and query response times. As system resources are limited, various query optimization techniques have been proposed. To the best of our knowledge, none of the existing methods takes into account the size of the window that is input to a query. We believe resource usage can be tackled with a novel approach that attempts to compute an optimal window size for a given continuous query, thereby placing a minimal upper bound on the resource consumption of that query.

HAL

Hal-Diderot

Positioning Support in Pervasive Environments

Author: Dejene Ejigu
Scuturici Vasile-Marian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 26/06/2006

In order to implement reactive and proactive functionalities in a pervasive environment, contextual data must be processed. One of the most important features of the context is the position of users and devices. In this paper, we describe a method to determine the position of a WiFi-enabled device. The prediction is based on the signal strength of the available access points. The prediction model is built from a database of signal strengths measured at known locations. The result is the name of the room or office where the device is located. We also present a usage scenario in which the user or device position is used to start proactive actions in our pervasive service environment called PerS
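
The abstract does not say which prediction model is built from the measurement database. As a minimal sketch under that caveat, the example below assumes the simplest fingerprinting scheme: pick the room whose stored signal-strength vector is nearest (Euclidean distance) to the current reading. The function name, the placeholder value for an unheard access point, and the sample measurements are all illustrative assumptions.

```python
import math

# Assumed placeholder (in dBm) for an access point that is not heard at all.
MISSING_DBM = -100.0


def predict_room(fingerprints, reading):
    """Return the room whose stored fingerprint is closest to the reading.

    fingerprints: list of (room, {access_point: dBm}) measured at known spots.
    reading:      {access_point: dBm} observed by the device to locate.
    """
    def distance(a, b):
        # Compare over every access point seen in either measurement.
        aps = set(a) | set(b)
        return math.sqrt(sum(
            (a.get(ap, MISSING_DBM) - b.get(ap, MISSING_DBM)) ** 2
            for ap in aps
        ))

    room, _ = min(fingerprints, key=lambda fp: distance(fp[1], reading))
    return room


# Illustrative measurement database: one fingerprint per room.
FINGERPRINTS = [
    ("Office 501", {"ap1": -40.0, "ap2": -70.0}),
    ("Meeting room", {"ap1": -75.0, "ap2": -45.0}),
]
```

For instance, `predict_room(FINGERPRINTS, {"ap1": -45.0, "ap2": -68.0})` selects the office fingerprint, since both signal strengths are within a few dB of it. A practical system would likely store several fingerprints per room and vote among the nearest ones, because signal strength fluctuates over time.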

HAL

Hal-Diderot

Interest-Awareness for Information Sharing in MANETs

Author: Brunie Lionel
Negash Addisalem
Scuturici Vasile-Marian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/05/2010

Progress in mobile devices and wireless communication technologies increases the importance of opportunistic networks such as Mobile Ad-hoc NETworks (MANETs). In MANETs, information sharing is performed by distributing advertisements and queries. To avoid loading the environment with unnecessary traffic, file advertisement and query resolution should be performed according to the interests of users. In this paper, we propose algorithms to identify and estimate users' interests. Experiments are conducted to evaluate the performance of the proposed algorithms on a mobile phone and a PC.

Hal-Diderot

Expressing and Interpreting User Intention in Pervasive Service Environments

Author: Bihler Pascal
Brunie Lionel
Scuturici Vasile-Marian
Publication venue: Digital Information Research Foundation
Publication date: 01/05/2006

The introduction of pervasive computing environments calls for new forms of human-machine interaction. Well-defined interaction interfaces will give way to other, more intuitive ways of interacting. In a pervasive service environment, the system middleware should take care of capturing the user's expression of an intended action, resolving ambiguities in this expression, and executing the final pervasive action. This article introduces the Pervasive Service Action Query Language (PsaQL), a language to formalize the description of a user intention using composed pervasive services. It then presents the next steps of intention processing in a pervasive service environment: a mathematical model is given, which helps to express the algorithms that translate the user intention into an executable action. To implement such algorithms, a suitable object-oriented model representing actions is introduced. Within PERSE, a pervasive service environment developed by our research group, general evaluation metrics for such algorithms are identified, a prototype has been developed, and first benchmark results are presented.

Hal-Diderot